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Abstract 

In this work we address the problem of bhndly reconstructing compressively sensed signals by ex- 
ploiting the co-sparse analysis model. In the analysis model it is assumed that a signal multiplied by an 
analysis operator results in a sparse vector. We propose an algorithm that learns the operator adaptively 
during the reconstruction process. The arising optimization problem is tackled via a geometric conju- 
gate gradient approach. Different types of sampling noise are handled by simply exchanging the data 
fidelity term. Numerical experiments are performed for measurements corrupted with Gaussian as well 
as impulsive noise to show the effectiveness of our method. 

Index Terms 

Analysis Operator Learning, Blind Compressive Sensing, Optimization on Matrix Manifolds. 



1 Introduction 

1.1 Regularization in Compressive Sensing 

In recent years, Compressive Sensing (CS) has influenced many fields in signal processing. Basically, the 
theory states that if an unknovi^n signal s G M" can be sparsely represented, only a iew m < n linear and non- 
adaptive measurements y G M™ of the signal suffice to accurately reconstruct it. Denoting the measurement 
vectors by {(f>.i G K."}^j^, the measurement process can be compactly written as 

y = [(^^, . . . ,0„i]^s + z = *s + z, (1) 

where $ G M'"^" is the measurement matrix, and z G M'" constitutes possible sampling errors. Due to the 
reduced dimensionality, reconstructing s from the measurements is ill-posed in general, and cannot be done 
by simply inverting However, additional model assumptions on s may help to find a solution. In this 
context, the sparse synthesis- approach and the co-sparse analysis-approach [1] have proven extremely useful. 
In the sparse synthesis approach it is assumed that a signal can be decomposed into a linear combination of 
only a few columns, called atoms, of a known dictionary V G M."-^"^ with d > n, i.e. s = Vx with x G M'' 
being the sparse coefficient vector. Many algorithms for solving the synthesis problem exist, cf. [2 for an 
extensive overview. 

The co-sparse analysis approach is a similar looking but yet very different alternative to tackle the CS 
problem. Its underlying assumption is that a signal multiplied by an analysis operator fl G M*^^" with k > n 
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results in a sparse vector fls G R*^. If 5: R*^ — ^ R denotes a function that measures sparsity, the analysis 
model assumption is exploited via 

s* = argmin g{fls) s.t. ||$s-y|l2<e. (2) 

sGR" 

The analysis model has proven useful in the field of image reconstruction and we thus restrict ourselves to 
compressively sensed images here. Our approach is motivated by the observation that learning the operator 
leads to an improved image reconstruction quality [3], [3] compared to applying a finite difference operator 
that approximates the image gradient, known as Total Variation (TV- norm) regularization [5 , [6 . In contrast 
to the task of dictionary learning only a few analysis operator learning algorithms have been proposed in the 
literature so far, cf. [7], [5], [1], [5]. Furthermore, from image denoising it is known that the reconstruction 
accuracy can be further improved when the dictionary or operator is not only learned on some general and 
representative training set, but rather directly on the specific signal that has to be reconstructed [S], [3]- 
These observations prompted us to combine the image reconstruction performance of the analysis approach 
together with the accuracy improvement capabilities of a learned operator. 

1.2 Blind Compressive Sensing 

The principle of CS relies on the fact that the signal s has a sparse representation in a given basis or 
dictionary V that is universal for the considered signal class of interest. However, such universal dictionaries 
do not necessarily result in the sparsest possible representation, which is crucial for the recovery success. 
Due to this, in [TD] the concept of Blind Compressive Sensing (BCS) has been introduced, which aims at 
simultaneously learning the dictionary and reconstructing the signal, see also [TT] and |T2] for an extension 
of this idea. Note that all these methods are based on the synthesis model and consider the problem of 
finding a suitable dictionary, while in this paper we focus on the analysis model. 

1.3 Our Contribution 

In this work we address the problem of signal reconstruction from compressively sensed data regularized by 
an adaptively learned analysis operator. The work of Hawe et al. [3], which focuses on learning a global 
patch based analysis operator from noise free training samples, has already shown the superior performance 
of a learned operator compared to state-of-the-art analysis and synthesis based regularization, like e.g. K- 
SVD denoising, in the context of classical image reconstruction problems. That is why we extend this idea 
and build on their work to utilize the learning process to obtain a signal dependent regularization of the 
inverse problem. Since we are dealing with compressive measurements, our approach can be interpreted as 
an analysis-based BCS problem with no prior knowledge about the operator. 

We extend the algorithm proposed in [3], where the operator is learned by a geometric Conjugate Gradient 
(CG) method on the so-called oblique manifold, to our setting of simultaneous image reconstruction and 
operator learning. This approach allows us to compensate for various sampling noise models, i.e. Gaussian or 
impulsive noise, by simply exchanging the data fidelity term. To summarize, the advantages of our approach 
are as follows: (i) The learning process allows to adaptively find an adequate operator that fits the underlying 
image structure, (ii) There is no necessity to train the operator prior to the reconstruction, (iii) Different 
noise types are handled by simply exchanging the data fidelity term. 

2 Problem Statement 

Our goal is to find a local analysis operator G R'^^" with k > n simultaneously to the signal s £ R^ that 
has to be reconstructed from the compressive measurements. Here, the vector s denotes a vectorized image 
of dimension N = wh, with w being the width and h being the height of the image, respectively, obtained 
by stacking the columns of the image above each other. Note that the analysis operator has to be applied 
to local image patches rather than to the whole image. We denote the binary {n x N) matrix that extracts 
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the patch centered at the (r, c) pixel by Vrc- Furthermore, practice has shown that the learning process is 
significantly faster if centered^ i.e. zero mean patches are considered. This can be easily incorporated by 
multiplying the vectorized patch with M := {Inxn — ^iJnxn)j where I and J are the identity operator and 
the matrix with all elements equal to one, respectively. We employ constant padding at the image borders, 
i.e. replicating the boundary pixel values. In the end, we globally promote sparsity with an appropriate 
function g{-) : R*^ — >■ ffi. and write for the problem of finding a suitable analysis operator 

n* = aigminy^ g{nMVrcsf, (3) 

where C denotes an admissible set, which implies some constraints on H to avoid trivial solutions. We follow 
the considerations of the authors in [3], demanding that: 

(i) The rows of 17 have unit Euclidean norm, i.e. ||t<Ji||2 = 1, for i = l,...,fc, where u)i denotes the 
transposed of the i*^-vow of O. 

(ii) The analysis operator $7 has full rank, i.e. rk(r2) = n. 

(iii) The mutual coherence of the analysis operator should be moderate. 

These constraints motivate to consider the set of full rank matrices with normalized columns, which admits 
a manifold structure known as the oblique manifold 

OB(n, k) := {X G M"'"' | rk(A') = n, ddiag(A'^ A') = Z^}. (4) 

Here, ddiag(V) is the diagonal matrix whose entries on the diagonal are those of V. Since we require the 
rows of Jl to have unit Euclidean norm, we restrict to be an element of OB(n, fc). To enforce the rank 
constraint (ii) we employ the penalty function 

Furthermore, the mutual coherence of the analysis operator, formulated in constraint (iii), can be controlled 
via the logarithmic barrier function of the atoms' scalar products, namely 

r{n):=- log(l - (^Z'^,)'). (6) 

\<i<j<k 

Considerations concerning the usefulness of these penalty functions can be found in To measure the 
sparsity of the analyzed patches, we use the differentiable sparsity promoting function 

5(w)=5]log(l + c-(eJw)2), (7) 

where c is a positive constant and represents the j*'' standard basis vector with the same length as w. 

Since we are interested in simultaneous operator learning and image reconstruction, we further introduce 
a data term p(-), which measures the fidelity of the reconstructed signal to the measurements y £ R*^. The 
choice of p(-) depends on the error model, i.e. by using p(-) = || • Hi the error is assumed to be Gaussian 
distributed. If the noise is sparsely distributed over the measurements, we set p(-) — g{-). This error model 
has also been utilized in ^13j to compensate for sparse outliers in the measurements. 

Finally, combining the data term with the constraints and the sparsity promoting function g, the aug- 
mented Lagrangian optimization problem for adaptively learning the analysis operator with simultaneous 
image reconstruction consists of minimizing the cost 

/(f2\s)= 2^;^ g(flX7',es)2+ ?7p(*s-y)+ 7/1(11) + «;r(n), (8) 

(r,c) 

subject to G OB(n, fc) with the measurement matrix $ G R^^^^. The scalar B denotes the number of 
extracted image patches. The parameter rj G R+ weights the fidelity of the solution to the measurements 
and the parameters 7, k G R^ control the infiuence of the two constraints. 
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3 Optimization Algorithm 



Since the cost function ([8]) is restricted to a smooth manifold, we follow [3] and employ a conjugate gradient 
on manifolds approach to solve the optimization problem. The CG approach is scalable and converges fast in 
practice. It is thus well-suited to handle the high dimensional problem of simultaneous image reconstruction 
and operator learning. The challenges for developing the CG method are the efficient computation of the 
Riemannian gradient, the step-size and the update directions. To that end, we employ the product manifold 
structure of OB(n, k) x considered as a Riemannian submanifold of R"^'^ x M^. To enhance legibility in 
the remainder of this section we denote the oblique manifold by OB. We further denote the tangent space 
at a point fl^ = X ^ OB as T^'OB, with S G T^OB being a tangent vector at X. 

The Riemannian gradient at X is given by the orthogonal projection of the standard (Euclidean) gradient 
onto the tangent space r;fOB. The orthogonal projection of a matrix Q G R"^''' onto the tangent space 
TxOB is obtained by ^TxOb{Q) = Q — Xddia.g{X^ Q). Using the product structure and denoting the partial 
derivatives of / by Vs/(<-f,s) and Vxf{X,s), respectively, the Riemannian gradient of the cost function is 

g{X, S) = (nT^OB(V;t/), vj) . (9) 

In CG methods the updated search directions (H^'+i', h(*+i)) e r;t'('+i)OB x R^ are linear combinations 
of the respective gradient and the previous search directions ("H^'^ h^^'^) g T;^.(i) OB x R^. The identification 
of different tangent spaces is done by the so-called parallel transport Ti^'^^-* :— T(5, "H^*), a*-'-*), which 
transports a tangent vector S along a geodesic to the tangent space r;^-(i+i)OB. In the manifold setting 
geodesies can be considered as the generalization of straight lines. We denote the geodesic from X^^^^ along 
the direction H^*-* as T{X'^^\'H'^^\t). Regarding the product manifold the new iterates are computed by 

^ (^r(A'«,-H«,aW),s« -f a«hW) , (10) 

where a'^^^ denotes the step size that leads to a sufficient decrease of the cost function. The parallel transport 
along the geodesies in the product manifold is then given by 

We use a hybridization of the Hestenes-Stiefel (HS) and the Dai Yuan (DY) formula as motivated in 
[l4] to determine the update of the search direction. With the shorthand notations :— Q{X'-'^'>) and 
g(^) := a(s(^)), as wen as U'^'+^'> = 0'^'+^'' - T^}^^^ and u(*+i) = g('+i) - g(*) the manifold adaptions of these 
formulas are 

^ (g(^+i),^y(m)) + (g(m)^ 

(r^;+'\i^('+i)) + (hw,u(«+i)) ' 
^ (g(-+i),g('+i)) + (g(»+i),g(-+i)) 
(r^;^'',w(»+i)) + (hw,u(»+i)) ' 

where (•, •) denotes the standard inner product in the respective Euclidean spaces. With the hybrid update 
formula 

= max (O, min(/3{;)., ^l)) , (14) 

the new search directions are given by 

(^(^+i)^h(^4-i)) ^ (-e(A'(^+i),s(^+i)) . (15) 

In our implementation we use the well-known backtracking line search which is adapted to the manifold 
setting until the Armijo condition is met. We name our method Analysis Blind Compressive Sensing (ABCS) 
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(a) ABCS 



(b) NESTA+TV 



Figure 1: Reconstruction of the Barbara image from M = N/4 measurements corrupted by additive white Gaussian noise with 

^noisc 5.1. 

and briefly summarize the whole procedure in Algorithm 1. For further details concerning CG- methods on 
the oblique manifold the reader is referred to [3], [T3], and |16) . 

Algorithm 1 ABCS 

Input: Initial operator i~linit, noisy measurements y, measurement matrix parameters ^^K,ri^c 
Set: i <- 0, s(") ^ *Ty, ^ O^^,, T^W ^ -^W, ^ -g(") 
1: repeat 

2: perform backtracking line search to get step size a*-*-* 
3: update to {X('+^\s^'+^^), cf. ^ 
4: compute e?(A'(^+i),s('+i)) 
5: compute /3[y^^, cf. ^ 

6: compute new CG-search directions ('H*-*'''^'', h(*+^)) 
7: i = i + l 

8: until II A"'*) - A'^'^^^IIf < 10"^ V i = maximum # of iterations 
Output: n* ^ A-WT^ s* ^ sW 



4 Experimental Results 

To measure the image reconstruction accuracy we use the peak signal-to-noise ratio 
{PSNR) = 101og(2552iV/X;^i(si - and the Mean Structural SIMilarity Index (MSSIM), with the 

same set of parameters as originally suggested and implemented in [17| . Throughout our experiments we use 
a patch size of (7x7), i.e. n = 49 and set k = 2n, as larger values of k do not enhance the reconstruction 
quality. We initialized ^init to be a random matrix and normalized the rows to unit norm. With this 
initialization, convergence to a local minimum was observed in all our experiments. The parameters for the 
constraints are set to 7 = 20 and k = 1000. The constant c in the sparsity inducing function ([7]) is chosen 
as c = 10"'. The parameter ry takes into account the size of the image as well as the operator size and reads 
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Table 1: Image reconstruction from measurements corrupted by additive white Gaussian noise with standard deviation (Tnoii 
The measurement rates are M = N/A (top) and M = N/W (bottom). Achieved PSNR in decibels and MSSIM. 







Girl 


Barbara 


Texture 


Method 




PSNR 


MSSIM 


PSNR 


MSSIM 


PSNR 


MSSIM 


NESTA 


0.1 


31.97 


0.794 


25.03 


0.686 


26.92 


0.732 


+TV 


5.1 


30.97 


0.754 


24.71 


0.676 


26.53 


0.717 




10.2 


29.72 


0.701 


24.01 


0.641 


25.66 


0.668 


ABCS 


0.1 


32.38 


0.806 


32.10 


0.895 


28.33 


0.807 




5.1 


31.36 


0.767 


29.79 


0.847 


27.73 


0.779 




10.2 


29.94 


0.708 


27.39 


0.766 


26.59 


0.731 


NESTA 


0.1 


29.51 


0.690 


22.59 


0.560 


23.87 


0.544 


+TV 


5.1 


28.95 


0.667 


22.56 


0.576 


23.72 


0.539 




10.2 


28.06 


0.632 


22.31 


0.559 


23.32 


0.522 


ABCS 


0.1 


29.98 


0.711 


24.60 


0.651 


25.06 


0.644 




5.1 


29.29 


0.684 


23.31 


0.587 


24.84 


0.625 




10.2 


28.31 


0.645 


22.79 


0.557 


24.15 


0.590 



rj = f] ■ {^) , with t) adjusted according to the noise level as explained below and a normalization factor 

r = 

256 ■ r-l 

We evaluate our method on the three images Girl (256 x 256), Barbara (512 x 512), and Texturei^ 
(256 X 256). The measurements are obtained by using the real valued noiselet transformation proposed in 

In the first experiment we show the robustness of the ABCS algorithm to sampling noise which follows 
a Gaussian distribution. For this purpose, the measurements have been artificially corrupted by additive 
white Gaussian noise with standard deviation dnoise- The data term in ^ reads p(-) = || • |||. We assume 
the noise level (Tnoise to be known and set r) = . Two measurement rates M — N/A and M — N/10 are 
considered. Table [T] shows the reconstruction performance for different noise levels. For comparison we used 
the algorithm of [18^ (NESTA), with TV- norm regularization and optimized parameters. Figure [T] shows the 
reconstructed images from M — N/A measurements and a noise level of (Tnoise = 5.1. We also tested the 
algorithm proposed in [19] (TVAL3) with different parameters, which achieves results comparable to NESTA. 
Due to space limitations, detailed results are not listed here. In all settings, the same measurements are 
used. 

To handle measurements that are corrupted by impulsive noise, we exchange the data fidelity function 
in ([S]) to p(-) = (?(•) in our second experiment. Corrupted coefficients are set to a value of ± 1.25 • \y\max- 
Table [2] summarizes the results for a sampling rate of M = N/A and different amounts d of corrupted 
measurements. In the ABCS algorithm, the parameter i) is set to 0.08 and to 0.05 for 10% and, respectively 
20%, of corrupted measurements. To compare our results achieved with the adaptively learned operator, we 
used the same setting of the reconstruction scheme with a fixed finite difference operator denoted as TV in 
Tabled 

Both experiments confirm that the adaptively learned operator leads to an accuracy improvement com- 
pared to the reconstruction quality obtained with a fixed finite difference operator. In particular, the 
structures in the Barbara and Texture image are better preserved by ABCS. 

5 Conclusion 

In this article we proposed an analysis based blind compressive sensing algorithm that simultaneously recon- 
structs an image from compressively sensed data and learns an appropriate analysis operator. This process 

^Image 1.5. 03. tiff obtained from the USC-SIPI Image Database: http: //sipi.usc.edu/database/| and cropped to (256 X 256) 
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Table 2: Image reconstruction from measurements corrupted by impulsive noise. Achieved PSNR in decibels and MSSIM. The 
values in brackets correspond to the amount of corrupted measurements. 



Aiol.llcKl 


Girl 


Barbara 


Texture 


I'SNR 


MSSIM 


I'SNR 


MSSIM 


I'SNIi 


MSSIM 


ABCS (10%) 


31.82 


0.784 


28.78 


0.827 


26.66 


0.719 


ABCS (20%) 


30.89 


0.749 


22.47 


0.577 


25.18 


0.617 


TV (10%) 


30.23 


0.727 


22.80 


0.532 


24.13 


0.585 


TV (20%) 


29.80 


0.708 


22.43 


0.510 


23.35 


0.537 



is formulated as an optimization problem, which is taekled via a geometric conjugate gradient approach that 
updates both the operator and the image as a whole at each iteration. Furthermore, the algorithm can be 
easily adapted to different noise models by simply exchanging the data fidelity term. 
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